fix: screening/evaluating data quality and Cypher query bugs#53
Merged
…source in screening

The screening skill was storing search snippets as abstract_text instead of the official abstract from fetch_patent. This was caused by:

1. No explicit instruction to use fetch_patent.abstract_text (not snippet)
2. No legal_status column in screened_patents table, causing judgment to conflate relevance assessment with legal status
3. Tests only verified fetch_patent was invoked, not that results were used

Changes:

- Add legal_status column to screened_patents table
- Change judgment CHECK constraint to only allow relevant/irrelevant
- Update screening SKILL.md to explicitly distinguish abstract_text from snippet and batch fetch patents (up to 10 in parallel)
- Add CRITICAL warnings against using snippet as abstract
- Update record-screening.md with legal_status parameter
- Add test checks: all_patents_screened, legal_status_recorded, patent_fetch_invoked for both screening and evaluating
- Update all test fixtures to include legal_status in INSERT statements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
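For reviewers who want to see the effect of the schema change in isolation, here is a minimal sketch of a screened_patents table where judgment is restricted to relevant/irrelevant and legal status lives in its own column. It assumes an SQLite-style database; the patent_id column, the record_screening helper, and the sample values are hypothetical and not taken from this repository.

```python
import sqlite3

# Hypothetical schema: only screened_patents, abstract_text, legal_status and
# judgment are named in this PR; patent_id and the column types are assumptions.
SCHEMA = """
CREATE TABLE screened_patents (
    patent_id     TEXT PRIMARY KEY,
    abstract_text TEXT NOT NULL,
    legal_status  TEXT,
    judgment      TEXT NOT NULL CHECK (judgment IN ('relevant', 'irrelevant'))
);
"""


def record_screening(conn, patent_id, abstract_text, legal_status, judgment):
    """Insert one screening result; the CHECK constraint rejects other judgments."""
    conn.execute(
        "INSERT INTO screened_patents "
        "(patent_id, abstract_text, legal_status, judgment) VALUES (?, ?, ?, ?)",
        (patent_id, abstract_text, legal_status, judgment),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Relevance and legal status are recorded separately.
record_screening(conn, "EXAMPLE-1", "Official abstract text.", "expired", "relevant")

# A legal status written into judgment is now rejected by the database.
try:
    record_screening(conn, "EXAMPLE-2", "Another abstract.", "expired", "expired")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM screened_patents").fetchone()[0]
```

With the constraint in place, a row whose judgment carries a legal status fails at insert time instead of silently mixing the two concepts.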
- Change from relationship pattern MATCH (p:Patent)-[:claims]->(c:claims) to direct node match MATCH (c:claims) to avoid c.text returning null
- Remove ORDER BY toInteger(c.number) which also causes c.text null bug
- Add batch parallel fetch (up to 10 patents) for performance
- Add CRITICAL warnings documenting the Cypher parser bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
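Since the query no longer orders its results, a caller that needs claims in numeric order would have to sort them after fetching. The sketch below shows one way to combine that with the batch of up to 10 parallel fetches. The fetch_claims_stub function, the row shape (dicts keyed by "c.number" and "c.text"), and the assumption that claim numbers arrive as strings are all hypothetical stand-ins, not the actual google-patent-cli interface.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 10  # batch size named in this PR

# The workaround query: direct node match, no ORDER BY.
CLAIMS_QUERY = "MATCH (c:claims) RETURN c.number, c.text"


def fetch_claims_stub(patent_id):
    """Hypothetical stand-in for running CLAIMS_QUERY against one patent.

    Returns rows in arbitrary order, as an unordered query might.
    """
    return [
        {"c.number": "10", "c.text": f"{patent_id} claim 10"},
        {"c.number": "2", "c.text": f"{patent_id} claim 2"},
        {"c.number": "1", "c.text": f"{patent_id} claim 1"},
    ]


def sort_claims(rows):
    """Order rows numerically by claim number on the client side."""
    return sorted(rows, key=lambda row: int(row["c.number"]))


def fetch_all_claims(patent_ids, fetch=fetch_claims_stub):
    """Fetch claims for every patent, running at most MAX_PARALLEL fetches at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # pool.map yields results in the same order as patent_ids.
        for patent_id, rows in zip(patent_ids, pool.map(fetch, patent_ids)):
            results[patent_id] = sort_claims(rows)
    return results


claims = fetch_all_claims(["P1", "P2"])
```

Sorting with int() rather than on the raw string keeps claim 10 after claim 2, which is the ordering the removed toInteger(c.number) clause was presumably meant to provide.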
Summary
- Separate legal_status from judgment in schema, use fetch_patent.abstract_text instead of search snippet, add batch parallel fetch (10 patents)
- Use MATCH (c:claims) RETURN c.number, c.text without ORDER BY, because ORDER BY causes c.text to return null due to a parser bug
- Add legal_status_recorded, all_patents_screened, patent_fetch_invoked checks in screening/evaluating tests

Root Cause

Two Cypher parser bugs in google-patent-cli were identified:

1. ORDER BY toInteger(c.number) causes c.text to return expression: null
2. The relationship pattern (p:Patent)-[:claims]->(c:claims) also causes c.text to return null

Workaround: use MATCH (c:claims) RETURN c.number, c.text (no ORDER BY, direct node match)

Test plan

- screening/functional-with-data: PASS (101s)
- evaluating/functional: claims correctly retrieved with new query pattern
- mise run test

🤖 Generated with Claude Code